Plagiarism Detection across Distant Language Pairs
Authors
Abstract
Plagiarism, the unacknowledged reuse of text, does not end at language boundaries. Cross-language plagiarism occurs when a text is translated from a fragment written in a different language and no proper citation is provided. Despite the change of language, the contents and, in particular, the ideas remain the same. Whereas various methods for detecting monolingual plagiarism have been developed, less attention has been paid to the cross-language case. In this paper we compare two recently proposed cross-language plagiarism detection methods (CL-CNG, based on character n-grams, and CL-ASA, based on statistical translation) to a novel approach to this problem based on machine translation and monolingual similarity analysis (T+MA). We explore the effectiveness of the three approaches for less related languages. CL-CNG proves not to be appropriate for this kind of language pair, whereas T+MA performs better than the previously proposed models.
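As a rough illustration of the character n-gram idea behind CL-CNG, the sketch below compares two texts by the cosine similarity of their character 3-gram counts. The abstract does not give the method's settings, so the n-gram length, the normalization, the cosine measure, and the function names (`char_ngrams`, `cosine`, `cl_cng_similarity`) are all assumptions made for this example, not the authors' implementation.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (n = 3 is an assumed setting)."""
    # Assumed normalization: lowercase, drop punctuation, collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", "", text.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cl_cng_similarity(text_a: str, text_b: str, n: int = 3) -> float:
    """Hypothetical helper: n-gram overlap of two texts in any languages."""
    return cosine(char_ngrams(text_a, n), char_ngrams(text_b, n))


if __name__ == "__main__":
    # Related languages with a shared alphabet keep some n-grams in common.
    print(cl_cng_similarity("information", "información"))
    # Texts in different scripts share no character n-grams at all.
    print(cl_cng_similarity("plagiarism", "λογοκλοπή"))
```

In this sketch, two texts written in different scripts have no n-grams in common and score zero however close their meaning, which is consistent with the abstract's finding that CL-CNG is not appropriate for less related language pairs.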
Similar resources
Cross-Language Plagiarism Detection Methods
The present paper provides a summary of the existing approaches to plagiarism detection in a multilingual context. Our aim is to organize the available data for further research. Distant language pairs are of particular interest to us. The issue of cross-language plagiarism detection has acquired pronounced importance lately, since semantic contents of a document can be easily and disc...
Deep Investigation of Cross-Language Plagiarism Detection Methods
This paper is a deep investigation of cross-language plagiarism detection methods on a new recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to dra...
Análisis de similitud basado en grafos: Una nueva aproximación a la detección de plagio translingüe
The cross-language variant of automatic plagiarism detection tries to detect plagiarism among documents across language pairs. In recent years a few approaches have been proposed that use thesauri, alignment models or statistical dictionaries to deal with the similarity across languages. We propose a new approach to cross-language plagiarism detection that makes use of a multilingual semantic network ...
Old and new challenges in automatic plagiarism detection
Automatic methods of measuring similarity between program code and natural language text pairs have been used for many years to assist humans in detecting plagiarism. For example, over the past thirty years or so, a vast number of approaches have been proposed for detecting likely plagiarism between programs written by Computer Science students. However, more recently, approaches to identifying...
Cross-language plagiarism detection
Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections from the document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (i) a comprehensive retrieval process for cros...